BIBSORT
Section: User Commands (1)
Updated: 13 October 1992
NAME
bibsort - sort a BibTeX bibliography file
SYNOPSIS
bibsort [optional sort(1) switches] <infile >outfile
DESCRIPTION
bibsort
filters a BibTeX bibliography, or bibliography
fragment, on its standard input, printing on
standard output a sorted bibliography.
Sorting is by BibTeX tag name, or by
@String
macro name, and letter case is
ignored in the sorting.
If no command-line switches are provided for
sort(1),
then
-f
is supplied to cause letter case to be ignored.
If you also want to remove duplicate entries, you
could specify the switches
-f -u.
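The effect of these switches can be tried on sort(1) directly. The sketch below is only an illustration: the tag lines are made-up stand-ins for the keyed lines that bibsort hands to sort(1), and LC_ALL=C is an assumption made here so that the ordering does not depend on the locale.

```shell
#!/bin/sh
# Made-up stand-in lines; the real input to sort(1) is a whole
# folded entry per line, not a bare tag.
tags='knuth:1984:LP
Lamport:1986:LDP
Knuth:1984:LP
lamport:1986:LDP'

# -f alone: case is ignored, so both spellings of a tag sort together.
printf '%s\n' "$tags" | LC_ALL=C sort -f

# -f -u: lines that compare equal when case is ignored collapse to one.
printf '%s\n' "$tags" | LC_ALL=C sort -f -u
```

With the real program, the same switches are given on the command line, as in bibsort -f -u <infile >outfile.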
The input stream is conceptually divided into four
parts, any of which may be absent.
1. Introductory material such as comments, file
headers, and edit logs that are ignored by
BibTeX. No line in this part begins with an
at-sign, ``@''.
2. Preamble material delimited by ``@Preamble{'' and
a matching closing ``}'', intended to be processed
by TeX. Normally, there is only one such
entry in a bibliography file, although BibTeX
and bibsort permit more than one.
3. Macro definitions of the form
``@String{...}''. A single macro definition
may span multiple lines, and there are usually
several such definitions.
4. Bibliography entries such as ``@Article{...}'',
``@Book{...}'', ``@Proceedings{...}'', and
so on. For bibsort,
any line that begins with an ``@'' immediately
followed by letters and digits and an open brace
is considered to be such an entry.
The order of these parts is preserved in the
output stream. Part 1 will be unchanged, but
parts 2--4 will be sorted within themselves.
The sort key of ``@Preamble'' entries is their
initial line, of ``@String'' entries, the macro
name, and of all BibTeX entries, the citation tag
between the open curly brace and the trailing
comma.
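The key rules above can be sketched with a small awk(1) filter. This is a hedged illustration, not code from the bibsort script: the function name entry_keys, the tab-separated output, and the sample entries are all assumptions made for this example, and only the spellings ``@Preamble'' and ``@String'' used on this page are recognized.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the bibsort script): print
# "kind<TAB>sort key" for every line that opens an @-entry.
entry_keys() {
    awk '
        # @Preamble: the key is the whole initial line.
        /^@Preamble[{]/ { print "preamble\t" $0; next }

        # @String: the key is the macro name before the "=" sign.
        /^@String[{]/ {
            key = $0
            sub(/^[^{]*[{][ \t]*/, "", key)
            sub(/[ \t]*=.*$/, "", key)
            print "string\t" key
            next
        }

        # Any other "@" + letters/digits + "{": the key is the
        # citation tag between the brace and the trailing comma.
        /^@[A-Za-z0-9]+[{]/ {
            key = $0
            sub(/^[^{]*[{]/, "", key)
            sub(/,.*$/, "", key)
            print "entry\t" key
        }
    '
}

# Sample input, invented for this example.
entry_keys <<'EOF'
% introductory comment (part 1)
@Preamble{"\input macros.tex"}
@String{j-TUGBOAT = "TUGboat"}
@Article{Knuth:1984:LP,
  author = "Donald E. Knuth",
}
EOF
```

Lines that do not open an entry, such as the comment and the author field, produce no output, which matches the rule that only the opening line of an entry supplies its key.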
bibsort
will correctly handle UNIX files with LF line
terminators, as well as IBM PC DOS files with CR
LF line terminators; the essential requirement is
that input lines be delimited by LF characters.
CAVEATS
BibTeX accepts a looser syntax than
the current simple implementation of
bibsort
supports. In particular, outer
parentheses may
not
be used in place of braces following ``@keyword''
patterns, nor may there be leading or embedded
whitespace.
If you have such a file, you can use
bibclean(1)
to prettyprint it into a form that
bibsort
can handle successfully.
The user must be aware that sorting a bibliography
is not without peril, for at least these reasons:
1. BibTeX requires that entry tags given in
crossref = tag
pairs in a bibliography entry must refer to
entries defined later, rather than earlier,
in the bibliography file.
This regrettable implementation limitation of the
current (pre-1.0) BibTeX prevents arbitrary
ordering of entries when crossref
values are present.
2. If the BibTeX file contains interspersed
commentary between ``@keyword{...}'' entries,
this material will be considered part of the
preceding entry, and will be sorted with it.
Leading commentary, which describes the entry
that follows it, is more common, and will
therefore be separated from that entry and
moved elsewhere in the file.

This is normally not a problem for the part 1
material before the ``@Preamble'', since it is kept
together at the beginning of the output stream.
3. Some kinds of bibliography files should be kept in
a different order than alphabetically by tags. A
good example is a bibliography file with the
contents of a journal, for which publication order
is likely more suitable.
While a much more sophisticated implementation of
bibsort
could deal with the first point, solving the
second one requires human intelligence and natural
language understanding that computers lack.
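Before or after sorting, a file can at least be checked for the first peril. The following is a minimal sketch and not part of bibsort: the name crossref_check is hypothetical, and the script assumes that each crossref field sits on a line of its own and that tags are compared exactly as written.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of bibsort: print
# "entry -> target" for each crossref whose target is not defined
# later in the input. Assumes one "crossref = ..." field per line.
crossref_check() {
    awk '
        /^@(String|Preamble)[{]/ { next }

        # Remember the line number at which each tag is defined.
        /^@[A-Za-z0-9]+[{]/ {
            tag = $0
            sub(/^[^{]*[{]/, "", tag)
            sub(/,.*$/, "", tag)
            defined_at[tag] = NR
            current = tag
            next
        }

        # Record each crossref field (field names ignore case).
        tolower($0) ~ /^[ \t]*crossref[ \t]*=/ {
            ref = $0
            sub(/^[^=]*=[ \t]*/, "", ref)
            gsub(/[{}",]/, "", ref)
            sub(/[ \t]+$/, "", ref)
            n++
            from[n] = current; to[n] = ref; at[n] = NR
        }

        END {
            for (i = 1; i <= n; i++)
                if (!(to[i] in defined_at) || defined_at[to[i]] < at[i]) {
                    print from[i] " -> " to[i]
                    bad = 1
                }
            exit bad + 0
        }
    '
}
```

Running crossref_check <outfile on sorted output prints nothing and exits with status 0 when every cross-referenced entry still follows the entries that refer to it.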
bibsort
uses ASCII control characters 001 through 007 for
temporary modifications of the input stream. If
any of these are already present in the input,
they will be altered on output. This is unlikely
to be a problem, because those characters have
no printable representation and are not
conventionally used to mark line or page
boundaries in text files.
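Whether a file already contains any of these characters can be checked with tr(1) before sorting. This is a small sketch under the assumption of a POSIX tr(1) and wc(1); the function name count_reserved is hypothetical and is not part of bibsort.

```shell
#!/bin/sh
# Hypothetical helper, not part of bibsort: count the bytes in the
# range octal 001 through 007 found on standard input.
count_reserved() {
    # -c complements the set and -d deletes, so only bytes 001-007
    # survive; wc -c then counts them.
    LC_ALL=C tr -cd '\001-\007' | wc -c | tr -d ' '
}
```

A result of 0 from count_reserved <infile means that the temporary use of these characters cannot alter the file.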
PROGRAMMING NOTES
Some text editors permit application of an
arbitrary filter command to a region of text.
For example, in GNU
emacs(1),
the command
C-u M-x shell-command-on-region,
or equivalently,
C-u M-|,
can be used to run
bibsort
on a region of the buffer that is devoid of cross
references and other material that cannot be
safely sorted.
Some implementations of BibTeX editing support in
GNU
emacs(1)
have a
sort-bibtex-entries
command that is functionally similar to
bibsort.
However, the file size that can be processed
by
emacs(1)
is limited, while
bibsort
can be used on arbitrarily large files, since it
acts as a filter, processing a small amount of
data at a time. The sort stage needs the entire
data stream, but fortunately, the UNIX
sort(1)
command is clever enough to deal with very large
inputs.
The current implementation of
bibsort
follows the UNIX tradition of combining simple
already-available tools. A six-stage pipeline of
egrep(1),
nawk(1),
sort(1),
and
tr(1)
accomplishes the job in one pass with about 70
lines of shell script, 60 of which form a
nawk(1)
program for insertion of sort keys.
bibsort
was written and tested on several large
bibliographies in a couple of hours. By contrast,
bibtex(1)
is more than 11 000 lines of code and
documentation, and
bibclean(1)
is about 1500 lines long.
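The general approach of folding each entry into one keyed line, sorting, and unfolding can be sketched in a few lines. This is a simplified illustration and not the bibsort script: it has fewer stages than the real pipeline, the name minisort is hypothetical, and it assumes that the input holds only bibliography entries, starts with one, and contains no \001 or \002 characters.

```shell
#!/bin/sh
# Simplified illustration only; this is NOT the bibsort script.
# Assumptions: the input holds only @-entries (no header, @Preamble,
# or @String parts), starts with an entry, and contains no \001 or
# \002 characters.
minisort() {
    awk '
        # An entry opener starts a new record: "key \001 first-line".
        /^@[A-Za-z0-9]+[{]/ {
            key = $0
            sub(/^[^{]*[{]/, "", key)    # drop "@Type{"
            sub(/,.*$/, "", key)         # drop the comma and the rest
            if (n++) printf "\n"
            printf "%s\001%s", key, $0
            next
        }
        # Any other line is folded onto the current record with \002.
        n { printf "\002%s", $0 }
        END { if (n) printf "\n" }
    ' |
    LC_ALL=C sort -f |
    awk '{ print substr($0, index($0, "\001") + 1) }' |
    tr '\002' '\n'
}
```

The second awk(1) stage removes the sort key again, and tr(1) restores the original line breaks. Because every line after an entry opener is folded into that entry, commentary between entries travels with the preceding entry, as described under CAVEATS.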
BUGS
bibsort
may fail on some UNIX systems if their
sort(1)
implementations cannot handle very long lines,
because for sorting purposes, each complete
bibliography entry is temporarily folded into a
single line. You may be able to overcome this
problem by adding a
-znnnnn
switch to the
sort(1)
command (passed via the command line to
bibsort)
to increase the maximum line size to some larger
value of
nnnnn
bytes.
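To judge whether long lines are likely to be a problem, the size of the largest entry can be estimated beforehand. The sketch below is not part of bibsort: the name longest_entry is hypothetical, and the figure is only approximate, since it ignores the sort key that is added to each folded line and treats any material before the first entry as one entry.

```shell
#!/bin/sh
# Hypothetical helper, not part of bibsort: report the size in bytes
# of the largest entry, counting one byte per line terminator.
longest_entry() {
    LC_ALL=C awk '
        # A new entry opener closes the previous entry.
        /^@[A-Za-z0-9]+[{]/ { if (len > max) max = len; len = 0 }
        { len += length($0) + 1 }
        END { if (len > max) max = len; print max + 0 }
    '
}
```

The value printed by longest_entry <infile gives a lower bound for the line size that sort(1) must be able to handle.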
SEE ALSO
bibclean(1),
bibtex(1),
egrep(1),
emacs(1),
nawk(1),
sort(1),
tr(1).
AUTHOR
Nelson H. F. Beebe, Ph.D.
Center for Scientific Computing
Department of Mathematics
University of Utah
Salt Lake City, UT 84112
Tel: (801) 581-5254
FAX: (801) 581-4148
Email: <beebe@math.utah.edu>